@device(xgp)
@make<report>
@Counter(SubParagraph,Within Paragraph,TitleEnv HD4,ContentsEnv tc4,
	  Numbered [#1.],Referenced [#.1],IncrementedBy Use,Break)

@PageHeading(left "@c[Oleinick's Thesis]", right "@c[Preliminary]")
@Chapter<Introduction>
The purpose of this research is to demonstrate how to write parallel
programs that effectively use the multiple computers in a
multiprocessor.  Developing strategies for incorporating parallelism
into algorithms has been an area of intense interest for quite some
time, e.g., [Avriel and Wilde 66], [Karp and Miranker 68],
[Rosenfeld and Driscoll 69], [Heller 76], [Hyafil and Kung 76],
[Thompson and Kung 76], [Baudet, Brent and Kung 77] and [Baudet 78].
However, until very recently, only simulation and analysis techniques
were available for demonstrating the effectiveness of a parallel
algorithm.

With the emergence of multiprocessor computer systems that provide
users with the facilities for constructing parallel algorithms (e.g.,
Cm* and C.mmp), an algorithm's performance can now be verified by
implementing it.  Initial attempts to demonstrate the performance of
a simple parallel algorithm [Fuller 75] revealed unexpectedly large
degradations.  These degradations were not the result of an error or
inefficiency in decomposing the problem into cooperating processes.
Rather, they were traced to several non-algorithmic sources.  This
result indicates that in order to develop effective parallel
algorithms for multiprocessors it is necessary to be aware of the
target machine's performance characteristics.

Presently, the task of writing effective parallel software is an
ad hoc procedure of constructing code for one particular machine.
Since multiprocessors are almost as different from one another as
they are from uniprocessors, it is difficult to apply insight gained
from writing parallel software for one multiprocessor to another
machine.  However, by documenting the performance of various
implementations of several algorithms on one machine, we can
demonstrate the effectiveness of various strategies for capturing
parallelism.

One style of parallel programming for multiprocessors involves tightly
coupled cooperating processes.  Several decomposition strategies use
this approach, among them @i[pipelining] and @i[partitioning]
[Jones 78].  In both cases, simultaneously executing processes must
interact frequently.  Since inter-process communication constitutes an
overhead, tightly coupled systems exhibit performance degradations
proportional to the amount of interaction among the processes.  Thus,
in order to maintain high performance, one must reduce both the
overhead of inter-process communication and the amount of process
interaction.

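To make the cost of interaction concrete, the following is a minimal
sketch of a partitioned decomposition.  POSIX threads and a barrier
serve as stand-ins for the process and synchronization facilities
actually available on C.mmp (an assumption made only to keep the
example self-contained; the real facilities are introduced in chapter
two).  Each worker owns a fixed slice of the data, and the barrier
crossing at the end of each step is the inter-process interaction
whose frequency governs the overhead.
@begin(verbatim)
/* Sketch of a partitioned decomposition.  The thread and barrier
 * mechanisms below are illustrative, not those used on C.mmp. */
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4
#define NITEMS   1024
#define NSTEPS   100

static double data[NITEMS];
static pthread_barrier_t step_barrier;

static void *worker(void *arg)
{
    long id = (long)arg;
    long lo = id * (NITEMS / NWORKERS);   /* this worker's slice */
    long hi = lo + (NITEMS / NWORKERS);

    for (int step = 0; step < NSTEPS; step++) {
        for (long i = lo; i < hi; i++)
            data[i] += 1.0;               /* stand-in for real work */
        /* One interaction per step: every crossing is pure overhead,
         * so the useful work done between barriers must be large
         * relative to the cost of the crossing. */
        pthread_barrier_wait(&step_barrier);
    }
    return NULL;
}

int main(void)
{
    pthread_t tids[NWORKERS];

    pthread_barrier_init(&step_barrier, NULL, NWORKERS);
    for (long id = 0; id < NWORKERS; id++)
        pthread_create(&tids[id], NULL, worker, (void *)id);
    for (long id = 0; id < NWORKERS; id++)
        pthread_join(tids[id], NULL);
    pthread_barrier_destroy(&step_barrier);

    printf("data[0] = %g\n", data[0]);    /* expect 100 */
    return 0;
}
@end(verbatim)
The ratio of work performed between barrier crossings to the cost of
a crossing determines how much of the potential speedup survives.
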
The user has little power to reduce the overhead of inter-process
communication.  Since processes are created and maintained by the
operating system, inter-process communication is permitted in only a
few well-defined ways.  The user is given a selection of primitives
provided by the operating system, with which he can build his own
communication mechanisms.  The performance of any such mechanism,
however, is directly influenced by the performance of the operating
system's primitives.

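As an illustration, the following hypothetical sketch builds a
one-slot mailbox out of two such primitives, here a POSIX mutex and
condition variable standing in for whatever the operating system
actually supplies; the names and types are ours, chosen for the
example.
@begin(verbatim)
/* A one-slot mailbox built from operating-system primitives.
 * Illustrative only; not a mechanism described in this thesis. */
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    int             full;       /* 1 if a message is waiting */
    int             message;
} mailbox_t;

void mailbox_init(mailbox_t *mb)
{
    pthread_mutex_init(&mb->lock, NULL);
    pthread_cond_init(&mb->changed, NULL);
    mb->full = 0;
}

void mailbox_send(mailbox_t *mb, int msg)
{
    pthread_mutex_lock(&mb->lock);
    while (mb->full)                     /* wait for the slot to empty */
        pthread_cond_wait(&mb->changed, &mb->lock);
    mb->message = msg;
    mb->full = 1;
    pthread_cond_broadcast(&mb->changed);
    pthread_mutex_unlock(&mb->lock);
}

int mailbox_receive(mailbox_t *mb)
{
    int msg;

    pthread_mutex_lock(&mb->lock);
    while (!mb->full)                    /* wait for a message */
        pthread_cond_wait(&mb->changed, &mb->lock);
    msg = mb->message;
    mb->full = 0;
    pthread_cond_broadcast(&mb->changed);
    pthread_mutex_unlock(&mb->lock);
    return msg;
}
@end(verbatim)
Every send and receive pays for at least one acquisition of the
underlying primitive, so the mailbox can be no faster than the
primitives beneath it.
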
However, writing effective parallel software requires an awareness of
more than just the overheads involved in inter-process communication.
We adopted the following two-phase strategy for uncovering the major
influences on performance:
@begin(enumerate,group)

Develop a simple parallel algorithm as a vehicle for conducting a
performance study on C.mmp.

Use this test program to measure the effects on performance stemming
from both the hardware and the operating system.
@end(enumerate)
A brief introduction to the C.mmp environment, both hardware and
operating system, is contained in chapter two.  In addition, chapter
two contains the development of the rootfinding algorithm and the
calculation of its theoretical performance.

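As a preview, the sketch below shows the flavor of such an algorithm,
assuming a parallel generalization of bisection in which n processes
evaluate the function at n interior points so that the bracketing
interval shrinks by a factor of n+1 per step; the actual algorithm
and its performance analysis appear in chapter two.  Here the n
evaluations are done in a loop, where the parallel version gives each
to its own process.
@begin(verbatim)
/* Rootfinding by n-section; F and NPROC are illustrative
 * choices, not those used in the thesis. */
#include <stdio.h>

#define NPROC 4

static double F(double x) { return x * x - 2.0; } /* sample function */

int main(void)
{
    double lo = 0.0, hi = 2.0;            /* F(lo) <= 0 < F(hi) */

    while (hi - lo > 1e-12) {
        double x[NPROC + 2], f[NPROC + 2];

        x[0] = lo;          f[0] = F(lo);
        x[NPROC + 1] = hi;  f[NPROC + 1] = F(hi);
        /* In the parallel version, each of these evaluations is
         * performed by a separate process on its own processor. */
        for (int i = 1; i <= NPROC; i++) {
            x[i] = lo + i * (hi - lo) / (NPROC + 1);
            f[i] = F(x[i]);
        }
        /* Keep the first subinterval across which F changes sign. */
        for (int i = 0; i <= NPROC; i++) {
            if ((f[i] <= 0.0) != (f[i + 1] <= 0.0)) {
                lo = x[i];
                hi = x[i + 1];
                break;
            }
        }
    }
    printf("root = %.12f\n", (lo + hi) / 2.0);    /* sqrt(2) */
    return 0;
}
@end(verbatim)
With n processes the interval shrinks by a factor of n+1 per step,
versus 2 for sequential bisection, so under these assumptions the
ideal speedup is log2(n+1): logarithmic, not linear, in the number of
processes.
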
The investigation into the sources of performance perturbation is
presented in chapter three.

Since synchronization is a fundamental parallel programming issue,
chapter four is devoted entirely to studying the effects of
synchronization on performance.  The evaluation of the various
synchronization primitives is conducted at two levels:  the speed of
the basic synchronization operations, and the impact each primitive
has on the performance of the rootfinding algorithm.

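To illustrate the first level, the sketch below gives the kind of
primitive whose raw speed such a measurement addresses: a spin lock
built on an atomic test-and-set.  C11 atomics stand in for the
indivisible read-modify-write instruction of the actual hardware (an
assumption made only so the example is self-contained).
@begin(verbatim)
/* Spin lock on an atomic test-and-set; illustrative only.
 * Initialize with:  spinlock_t l = ATOMIC_FLAG_INIT;  */
#include <stdatomic.h>

typedef atomic_flag spinlock_t;

static inline void spin_lock(spinlock_t *l)
{
    /* Busy-wait: cheap when the lock is held briefly, but the
     * waiting processor does no useful work meanwhile. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;
}

static inline void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}
@end(verbatim)
Timing a tight loop of spin_lock/spin_unlock pairs gives the cost of
the basic operation; the second level of evaluation then asks how that
cost, and the busy-waiting behavior, show up in the rootfinding
algorithm's overall performance.
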
In chapter five, we apply the insights gained from the initial
investigation toward developing a technique for efficiently
implementing tightly coupled complex systems.  This technique involves
decomposing a problem into @i[task forces] [Jones 77] of cooperating
processes.

Chapter six contains the implementation and evaluation of a complex
task: the HARPY speech recognition system.  An initial decomposition
of the algorithm is successively refined in three implementations, and
in each iteration some aspect of performance is improved.  This
incremental refinement enables us to measure the performance
improvement contributed by each enhancement.

Chapter seven contains a summary of the investigations into the
implementation of parallel algorithms.  In the conclusion, the major
results and contributions of the thesis are presented.